First, generate data:
set.seed(0)
datobj = generate_data_generic(p=5, TT=300, fac=.5, nt=2000, dimdat = 3)
ylist = datobj$ylist
X = datobj$X
This produces three dimensional cytograms ylist and covariates X.
ylist is a list of length \(T=300\), the number of time points (or cytograms). Each element of ylist is an array with \(d=3\) rows (a single cytogram) and \(n_t\) columns. The number of columns \(n_t\) of each element in ylist can be different.
X is a \(T \times d\) matrix, whose \(t\)’th rows contain the relevant (environmental) factors of the \(t\)’th cytogram.
plot(ylist[[1]][,1:2], ylab="", xlab="", pch=16, col=rgb(0,0,1,0.2), cex=.5)
Especially if your data is a time series, it could be useful to plot the covariates once across time.
matplot(X, type = 'l')
Now, we estimate the data with fixed regularization parameters \(\lambda_\alpha=0.01\) and \(\lambda_\beta=0.01\), and \(K=10\) clusters.
Internally, flowmix() repeats the estimation 5 (the default) times, and returns the estimated model providing the best data fit.
numclust = 4
set.seed(0)
res = flowmix(ylist = ylist, X = X, numclust = numclust,
mean_lambda = 0.001, prob_lambda = 0.001,
nrep = 1)
print(res)
## [1] "47 out of 60 regression coefficients for the cluster centers, are zero."
## [1] "16 out of 20 regression coefficients for the cluster probabilities, are zero."
## [1] "The algorithm converged in 19 EM iterations."
The cluster probabilities look like this:
plot_prob(res)
Showing the model across time, in an animation:
par(mfrow = c(1,3), oma = c(2,2,2,2))
ylim = c(-3,8)
xlim = c(-5,8)
for(tt in 1:res$TT){
for(dims in list(c(1,2), c(2,3), c(3,1))){
scatterplot_2d(ylist, res, tt, dims = dims, cex_fac=1, ylim=ylim, xlim=xlim)
}
mtext(outer = TRUE,
text = paste0("t=", tt, " out of ", res$TT),
cex = 2)
}